Transfer learning with pre-trained models has become a growing trend in the machine learning community. Consequently, many pre-trained models are released online to facilitate further research. However, this raises widespread concerns about whether these pre-trained models leak privacy-sensitive information about their training data. In this work, we aim to answer the following questions: "Can we effectively recover private information from these pre-trained models? What are the sufficient conditions for retrieving such sensitive information?" We first explore different statistical information that can discriminate the private training distribution from other distributions. Based on our observations, we propose SecretGen, a novel private data reconstruction framework, to effectively recover private information. Compared with previous methods that can only recover private data given the ground-truth class prediction of the target instance, SecretGen does not require such prior knowledge, making it more practical. We conduct extensive experiments on different datasets under diverse scenarios to compare SecretGen with other baselines and provide a systematic benchmark to better understand the impact of different auxiliary information and optimization operations. We show that, without prior knowledge of the true class prediction, SecretGen is able to recover private data with performance comparable to methods that leverage such prior knowledge. When the prior knowledge is given, SecretGen significantly outperforms the baseline methods. We also propose several quantitative metrics to further quantify the privacy vulnerability of pre-trained models, which will help model selection for privacy-sensitive applications. Our code is available at: https://github.com/ai-secure/secretgen.
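For context, below is a minimal sketch of the generic GAN-based model-inversion optimization that private data reconstruction frameworks of this kind build on: a latent code is optimized so that a pre-trained target model assigns high confidence to a chosen class. This is not SecretGen's full pipeline (which additionally selects pseudo-labels and auxiliary statistics); `generator` and `target_model` are assumed pre-trained modules.

```python
# Sketch only: optimize a GAN latent so the target model predicts a chosen class.
import torch
import torch.nn.functional as F

def invert_class(generator, target_model, target_class, latent_dim=128,
                 steps=500, lr=0.02, device="cuda"):
    generator.eval()
    target_model.eval()
    z = torch.randn(1, latent_dim, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    label = torch.tensor([target_class], device=device)
    for _ in range(steps):
        optimizer.zero_grad()
        fake = generator(z)                 # candidate reconstruction of private data
        logits = target_model(fake)         # target model's prediction on the candidate
        loss = F.cross_entropy(logits, label)
        loss.backward()
        optimizer.step()
    return generator(z).detach()
```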
Performing holistic image understanding and 3D reconstruction from a single image is a core task in computer vision. This paper presents a system that performs holistic image segmentation, object detection, instance segmentation, depth estimation, and object instance 3D reconstruction for indoor and outdoor scenes from a single RGB image. We name our system Panoptic 3D Parsing, where panoptic segmentation covers both "stuff" segmentation and "thing" detection/segmentation. We design a stage-wise system for the setting where a complete set of annotations is unavailable. In addition, we introduce an end-to-end pipeline trained on a synthetic dataset with a full set of annotations. We show results on indoor (3D-FRONT) and outdoor (COCO and Cityscapes) scenes. Our proposed Panoptic 3D Parsing framework points to a promising direction in computer vision. It can be applied to various applications, including autonomous driving, mapping, robotics, design, computer graphics, human-computer interaction, and augmented reality.
Face anonymization with generative models has become increasingly prevalent, since it sanitizes private information by generating virtual face images, ensuring both privacy and image utility. Such virtual face images are usually no longer identifiable once the original identity has been removed or protected. In this paper, we formalize and address the problem of generating identifiable virtual face images. Our virtual face images are visually different from the originals for privacy protection. In addition, they carry new virtual identities that can be used directly for face recognition. We propose an Identifiable Virtual Face Generator (IVFG) to generate virtual face images. The IVFG projects the latent vectors of the original face images into virtual ones according to a user-specific key, based on which the virtual face images are generated. To make the virtual face images identifiable, we propose a multi-task learning objective together with a triplet-style training strategy to learn the IVFG. We evaluate the virtual face images with different face recognizers on different face image datasets, all of which demonstrate the effectiveness of the IVFG in generating identifiable virtual face images.
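One plausible way to express the key-conditioned projection and triplet objective mentioned in the abstract is sketched below; the module and argument names (`KeyProjector`, the latents) are illustrative, not the paper's actual implementation.

```python
# Sketch: project original-face latents with a user key, then enforce that two
# images of the same person share the same virtual identity under that key.
import torch
import torch.nn as nn

class KeyProjector(nn.Module):
    """Maps an original face latent plus a user key to a virtual-identity latent."""
    def __init__(self, latent_dim=512, key_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + key_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, latent, key):
        return self.net(torch.cat([latent, key], dim=-1))

triplet = nn.TripletMarginLoss(margin=0.5)

def identity_triplet_loss(proj, lat_a, lat_a2, lat_b, key):
    """lat_a / lat_a2: latents of two images of the same person;
    lat_b: latent of a different person; key: the shared user key."""
    anchor   = proj(lat_a,  key)   # virtual identity of person A, view 1
    positive = proj(lat_a2, key)   # same person, same key -> same virtual identity
    negative = proj(lat_b,  key)   # different person -> different virtual identity
    return triplet(anchor, positive, negative)
```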
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
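As background, the standard distillation objective such teacher-student frameworks build on is sketched below: the student matches the teacher's softened predictions in addition to the ground-truth labels. RELIANT's fairness regularizer would be added as an extra term; it is omitted here because the abstract does not spell it out.

```python
# Sketch of the vanilla KD loss a GNN student is trained with.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    distill = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    supervised = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * supervised
```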
To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the target resolution. Based on the observation that pixels rendered at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which turns supersampling into an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features to significantly improve their temporal stability. Then a reconstruction network based on a multi-scale U-Net with skip connections is adopted to reconstruct and generate the desired high-resolution image. Experimental results and comparisons show that our proposed method can generate higher-quality supersampling results than current state-of-the-art methods without increasing the total number of ray-traced samples.
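A minimal sketch of the temporal-accumulation idea follows: warp the previous frame's features with motion vectors, then use a per-pixel correlation with the current features to decide how much history to trust. The actual network in the paper is learned and more elaborate; the blending rule below is only illustrative.

```python
# Sketch: motion-vector warping plus correlation-gated temporal blending.
import torch
import torch.nn.functional as F

def warp(prev_feat, motion):
    """prev_feat: (N, C, H, W); motion: (N, 2, H, W) pixel offsets (x, y)."""
    n, _, h, w = prev_feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float().to(prev_feat.device)   # (H, W, 2)
    grid = grid.unsqueeze(0) + motion.permute(0, 2, 3, 1)               # apply offsets
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0                   # normalize to [-1, 1]
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0
    return F.grid_sample(prev_feat, grid, align_corners=True)

def temporal_blend(cur_feat, prev_feat, motion):
    warped = warp(prev_feat, motion)
    corr = F.cosine_similarity(cur_feat, warped, dim=1, eps=1e-6).unsqueeze(1)
    alpha = corr.clamp(min=0.0)          # trust history only where it agrees with the present
    return alpha * warped + (1 - alpha) * cur_feat
```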
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions without shared computation or task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework named Panoptic-PartFormer. Moreover, we find that the previous metric, PartPQ, is biased toward PQ. To handle both issues, we make the following contributions. Firstly, we design a meta-architecture that decouples part features from thing/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term this model Panoptic-PartFormer. Secondly, we propose a new metric, Part-Whole Quality (PWQ), to better measure the task from both pixel-region and part-whole perspectives. It can also decouple the errors of part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former and based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross attention scheme to further boost part segmentation quality, using masked cross attention for the part-whole interaction. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost reduction of 70% in GFLOPs and 50% in parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
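For reference, a compact sketch of the Mask2Former-style masked cross attention that this part-whole interaction builds on is shown below: queries attend only to pixels inside their current mask predictions. The single-head formulation, tensor shapes, and 0.5 threshold are illustrative simplifications.

```python
# Sketch: cross attention restricted to each query's predicted mask region.
import torch

def masked_cross_attention(queries, pixel_feats, mask_logits):
    """queries: (B, Q, C); pixel_feats: (B, HW, C); mask_logits: (B, Q, HW)."""
    scores = queries @ pixel_feats.transpose(1, 2) / queries.size(-1) ** 0.5   # (B, Q, HW)
    blocked = mask_logits.sigmoid() < 0.5                 # attend only inside the mask
    blocked = blocked & ~blocked.all(dim=-1, keepdim=True)  # empty mask -> attend everywhere
    scores = scores.masked_fill(blocked, float("-inf"))
    attn = scores.softmax(dim=-1)
    return attn @ pixel_feats                              # (B, Q, C) updated queries
```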
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) to segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting previously learned classes.
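A minimal sketch of the CLIP-embedding idea follows: encode organ/tumor names as text prompts and use the resulting vectors to score voxel features. The real Universal Model conditions its segmentation head on these embeddings; the dot-product head and the prompt template below are assumptions for illustration only.

```python
# Sketch: CLIP text embeddings as class prompts for a segmentation head.
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

organ_names = ["liver", "right kidney", "spleen", "pancreas", "liver tumor"]
prompts = clip.tokenize(
    [f"a computerized tomography of a {n}" for n in organ_names]  # assumed template
).to(device)

with torch.no_grad():
    text_emb = model.encode_text(prompts).float()                 # (num_classes, 512)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

def classify_voxels(voxel_feats):
    """voxel_feats: (N_voxels, 512) backbone features, assumed projected into CLIP space."""
    voxel_feats = voxel_feats / voxel_feats.norm(dim=-1, keepdim=True)
    return voxel_feats @ text_emb.t()                              # per-class scores
```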
This paper illustrates the technologies of user next-intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, we propose AlipayKG to explicitly characterize user intent. AlipayKG is an offline concept knowledge graph in the Life-Service domain that models the historical behaviors of users, the rich content users interact with, and the relations between them. We further introduce a Transformer-based model that integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of downstream tasks while retaining explainability.
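A toy sketch of the general setup (a Transformer over a user's historical behavior sequence predicting the next intent) is given below. How AlipayKG's expert rules and knowledge-graph features are injected is not specified here; all module names are illustrative.

```python
# Sketch: sequence model over past intents, predicting logits for the next intent.
import torch
import torch.nn as nn

class NextIntentModel(nn.Module):
    def __init__(self, num_intents, dim=128, heads=4, layers=2, max_len=64):
        super().__init__()
        self.intent_emb = nn.Embedding(num_intents, dim)
        self.pos_emb = nn.Embedding(max_len, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.head = nn.Linear(dim, num_intents)

    def forward(self, intent_ids):
        """intent_ids: (B, T) indices of past intents/behaviors."""
        pos = torch.arange(intent_ids.size(1), device=intent_ids.device)
        x = self.intent_emb(intent_ids) + self.pos_emb(pos)
        h = self.encoder(x)
        return self.head(h[:, -1])        # logits over the next intent
```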
Medical image segmentation (MIS) is essential for supporting disease diagnosis and treatment effect assessment. Despite considerable advances in artificial intelligence (AI) for MIS, clinicians remain skeptical of its utility and maintain low confidence in such black-box systems, a problem exacerbated by poor generalization to out-of-distribution (OOD) data. To move towards effective clinical utilization, we propose a foundation model named EvidenceCap, which makes the box transparent in a quantifiable way through uncertainty estimation. EvidenceCap not only makes the AI's uncertainty visible in ambiguous regions and on OOD data, but also enhances the reliability, robustness, and computational efficiency of MIS. Uncertainty is modeled explicitly through subjective logic theory to gather strong evidence from features. We show the effectiveness of EvidenceCap on three segmentation datasets and apply it in clinical practice. Our work sheds light on safe clinical applications and explainable AI, and can contribute towards trustworthiness in the medical domain.
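A sketch of the standard subjective-logic formulation used by evidential segmentation methods of this kind follows: non-negative evidence parameterizes a Dirichlet distribution, and the per-voxel uncertainty mass is K / S. This reproduces the common formulation, not the paper's exact network.

```python
# Sketch: Dirichlet evidence, belief masses, and uncertainty from raw logits.
import torch
import torch.nn.functional as F

def evidential_uncertainty(logits):
    """logits: (B, K, ...) raw per-class outputs of a segmentation head."""
    evidence = F.softplus(logits)            # non-negative evidence e_k
    alpha = evidence + 1.0                   # Dirichlet parameters
    S = alpha.sum(dim=1, keepdim=True)       # Dirichlet strength
    belief = evidence / S                    # belief mass per class
    uncertainty = logits.size(1) / S         # u = K / S, in (0, 1]
    prob = alpha / S                         # expected class probabilities
    return prob, belief, uncertainty
```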
Depression is a leading cause of death worldwide, and the diagnosis of depression is nontrivial. Multimodal learning is a popular solution for automatic diagnosis of depression, but existing works suffer from two main drawbacks: 1) the high-order interactions between different modalities cannot be well exploited; and 2) the interpretability of the models is weak. To remedy these drawbacks, we propose a multimodal multi-order factor fusion (MMFF) method. Our method exploits the high-order interactions between different modalities by extracting and assembling modality factors under the guidance of a shared latent proxy. We conduct extensive experiments on two recent and popular datasets, E-DAIC-WOZ and CMDC, and the results show that our method achieves significantly better performance than other existing approaches. Besides, by analyzing the factor assembly process, our model can intuitively show the contribution of each factor, which helps us understand the fusion mechanism.
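A toy sketch of factorized multimodal fusion is given below to illustrate the general idea: each modality is projected onto factors, and higher-order cross-modal interactions are taken as element-wise products of those factors. This is not MMFF's exact formulation (the shared latent proxy and assembly step are omitted), and all names are illustrative.

```python
# Sketch: first-order and element-wise-product (higher-order) fusion of modality factors.
import torch
import torch.nn as nn

class FactorFusion(nn.Module):
    def __init__(self, dims, factor_dim=64, num_classes=2):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, factor_dim) for d in dims)
        self.head = nn.Linear(factor_dim * 2, num_classes)

    def forward(self, feats):
        """feats: list of (B, d_m) feature tensors, one per modality."""
        factors = [p(f) for p, f in zip(self.proj, feats)]
        first_order = torch.stack(factors, dim=0).mean(dim=0)
        high_order = factors[0]
        for f in factors[1:]:
            high_order = high_order * f      # element-wise product across modalities
        return self.head(torch.cat([first_order, high_order], dim=-1))
```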